AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

Luis Haug, Sebastian Tschiatschek, Adish Singla

Teaching Inverse Reinforcement Learners via Features and Demonstrations

Neural Information Processing SystemsFeb-12-2026, 18:17:42 GMT

Weintroduceanaturalquantity,the teaching risk, which measures the potential suboptimality of policies that look optimal to the learner in this setting. We show that bounds on the teaching risk guarantee that the learner is able to find a near-optimal policy using standard algorithms basedoninversereinforcement learning. Basedonthesefindings, we suggest a teaching scheme in which the expert can decrease the teaching risk by updating the learner's worldview, and thus ultimately enable her to find a near-optimalpolicy.

machine learning, reinforcement learning, teaching risk, (18 more...)

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Saarland > Saarbrücken (0.04)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

Neural Information Processing SystemsNov-20-2025, 22:08:44 GMT

Teaching Inverse Reinforcement Learners via Features and Demonstrations

Learning near-optimal behaviour from an expert's demonstrations typically relies on the assumption that the learner knows the features that the true reward function depends on. In this paper, we study the problem of learning from demonstrations in the setting where this is not the case, i.e., where there is a mismatch between the worldviews of the learner and the expert. We introduce a natural quantity, the teaching risk, which measures the potential suboptimality of policies that look optimal to the learner in this setting. We show that bounds on the teaching risk guarantee that the learner is able to find a near-optimal policy using standard algorithms based on inverse reinforcement learning. Based on these findings, we suggest a teaching scheme in which the expert can decrease the teaching risk by updating the learner's worldview, and thus ultimately enable her to find a near-optimal policy.

feature and demonstration, name change, teaching inverse reinforcement learner, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

Luis Haug, Sebastian Tschiatschek, Adish Singla

Teaching Inverse Reinforcement Learners via Features and Demonstrations

Neural Information Processing SystemsNov-20-2025, 16:08:56 GMT

Neural Information Processing Systems http://nips.cc/

learner, machine learning, reinforcement learning, (17 more...)

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)

Teaching Inverse Reinforcement Learners via Features and Demonstrations

Neural Information Processing SystemsFeb-14-2020, 20:13:51 GMT

Learning near-optimal behaviour from an expert's demonstrations typically relies on the assumption that the learner knows the features that the true reward function depends on. In this paper, we study the problem of learning from demonstrations in the setting where this is not the case, i.e., where there is a mismatch between the worldviews of the learner and the expert. We introduce a natural quantity, the teaching risk, which measures the potential suboptimality of policies that look optimal to the learner in this setting. We show that bounds on the teaching risk guarantee that the learner is able to find a near-optimal policy using standard algorithms based on inverse reinforcement learning. Based on these findings, we suggest a teaching scheme in which the expert can decrease the teaching risk by updating the learner's worldview, and thus ultimately enable her to find a near-optimal policy.

feature and demonstration, near-optimal policy, teaching inverse reinforcement learner, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

arXiv.org Machine LearningMay-7-2019

Collaborative and Privacy-Preserving Machine Teaching via Consensus Optimization

Han, Yufei, Ma, Yuzhe, Gates, Christopher, Roundy, Kevin, Shen, Yun

In this work, we define a collaborative and privacy-preserving machine teaching paradigm with multiple distributed teachers. We focus on consensus super teaching. It aims at organizing distributed teachers to jointly select a compact while informative training subset from data hosted by the teachers to make a learner learn better. The challenges arise from three perspectives. First, the state-of-the-art pool-based super teaching method applies mixed-integer non-linear programming (MINLP) which does not scale well to very large data sets. Second, it is desirable to restrict data access of the teachers to only their own data during the collaboration stage to mitigate privacy leaks. Finally, the teaching collaboration should be communication-efficient since large communication overheads can cause synchronization delays between teachers. To address these challenges, we formulate collaborative teaching as a consensus and privacy-preserving optimization process to minimize teaching risk. We theoretically demonstrate the necessity of collaboration between teachers for improving the learner's learning. Furthermore, we show that the proposed method enjoys a similar property as the Oracle property of adaptive Lasso. The empirical study illustrates that our teaching method can deliver significantly more accurate teaching results with high speed, while the non-collaborative MINLP-based super teaching becomes prohibitively expensive to compute.

learner, teaching, teaching method, (15 more...)

arXiv.org Machine Learning

1905.02796

Country: North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report (0.65)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.91)

Teaching Inverse Reinforcement Learners via Features and Demonstrations

Neural Information Processing SystemsDec-31-2018

Learning near-optimal behaviour from an expert's demonstrations typically relies on the assumption that the learner knows the features that the true reward function depends on. In this paper, we study the problem of learning from demonstrations in the setting where this is not the case, i.e., where there is a mismatch between the worldviews of the learner and the expert. We introduce a natural quantity, the teaching risk, which measures the potential suboptimality of policies that look optimal to the learner in this setting. We show that bounds on the teaching risk guarantee that the learner is able to find a near-optimal policy using standard algorithms based on inverse reinforcement learning. Based on these findings, we suggest a teaching scheme in which the expert can decrease the teaching risk by updating the learner's worldview, and thus ultimately enable her to find a near-optimal policy.

learner, machine learning, reinforcement learning, (18 more...)

Country:

North America (0.46)
Europe (0.28)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Teaching Inverse Reinforcement Learners via Features and Demonstrations

Neural Information Processing SystemsDec-31-2018

Learning near-optimal behaviour from an expert's demonstrations typically relies on the assumption that the learner knows the features that the true reward function depends on. In this paper, we study the problem of learning from demonstrations in the setting where this is not the case, i.e., where there is a mismatch between the worldviews of the learner and the expert. We introduce a natural quantity, the teaching risk, which measures the potential suboptimality of policies that look optimal to the learner in this setting. We show that bounds on the teaching risk guarantee that the learner is able to find a near-optimal policy using standard algorithms based on inverse reinforcement learning. Based on these findings, we suggest a teaching scheme in which the expert can decrease the teaching risk by updating the learner's worldview, and thus ultimately enable her to find a near-optimal policy.

learner, machine learning, reinforcement learning, (18 more...)

Country:

North America (0.46)
Europe (0.46)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Teaching Inverse Reinforcement Learners via Features and Demonstrations